Abstract

Techniques for the estimation of unknown additive trends present in the state and
measurement processes of a Kalman-Bucy linear system are introduced. We obtain asymptotic
results describing the performance of the estimators under i.i.d. and periodic observation
schemes. The observed process is given by $dY(t) = g(t)\,dt + dZ(t)$, where $Z$ is the
measurement process and $g$ is an unknown trend function, and there is an additive trend $f$
present in the state process $X$. These two cases need to be treated separately in order
to ensure identifiability. The problem is to estimate $f$ and $g$, and remove them from
the measurement process. Trend removal involves replacing $f$ and $g$ in the Kalman filter
$\hat X(t) = E(X(t) \mid \mathcal{F}_t^Y)$ (based on observation of $Y$) by appropriate
estimates. We show that this can be done under the following observation schemes:
(I) $n$ i.i.d. replicates of $Y$ over a fixed interval $[0,T]$, (II) observation of a single
trajectory of $Y$ over a long interval $[0,nT]$, where $f$, $g$ and the functions defining
the linear system are periodic with period $T$.




1.  Introduction 


Consider a linear stochastic system of the type introduced by Kalman and Bucy
(1961): a $p$-dimensional "state" process $X$ and a $q$-dimensional "measurement"
process $Z$ are given by the stochastic differential equations

$$dX(t) = A(t)X(t)\,dt + B(t)u(t)\,dt + dW(t)$$
$$dZ(t) = C(t)X(t)\,dt + dV(t)$$

$0 \le t \le T$, where $W$ and $V$ are independent $p$- and $q$-dimensional Wiener
processes, $u(\cdot)$ is a known deterministic input, $A, B, C$ are known non-random
time-varying matrices of suitable dimensions, $X(0)$ is independent of $W$ and $V$, the
mean $E(X(0)) = m$ and covariance matrix of $X(0)$ are known, and $Z(0) = 0$. The
Kalman filtering theory provides recursive formulae for the conditional expectation
$\hat X(t) = E(X(t) \mid \mathcal{F}_t^Z)$, which is the optimal mean square estimate of
the state $X(t)$ given the past $\mathcal{F}_t^Z = \sigma(Z_s,\ 0 \le s \le t)$ of the
measurement process; see Liptser and Shiryayev (1978) and Kallianpur (1980).

In real applications of the Kalman filter to signal processing it is often found
that unknown additive trends are present in the state and measurement processes;
that is, the state process $X$ is given by

$$dX(t) = f(t)\,dt + A(t)X(t)\,dt + B(t)u(t)\,dt + dW(t) \tag{1}$$

and instead of observing $Z$, we observe the process $Y$ given by

$$dY(t) = g(t)\,dt + dZ(t), \quad Y(0) = 0, \tag{2}$$

where $f$ and $g$ are unknown "trend" functions.

In the present paper we shall consider the problem of estimating the trends $f$
and $g$ and removing them from the measurement process. Trend removal amounts
to replacing the functions $f$ and $g$ used in the Kalman filter
$\hat X(t) = E(X(t) \mid \mathcal{F}_t^Y)$ (based on observation of $Y$) by appropriate
estimates $\hat f$ and $\hat g$.

Two types of observation scheme are considered:

(I) $n$ realizations $(Y_i(t),\ t \in [0,T],\ i = 1,\dots,n)$ of the process $Y$ satisfying (1)
and (2) with the corresponding system realizations having independent noise
processes $W_i$ and $V_i$, $i = 1,\dots,n$.

(II) observation of a single trajectory of $Y$ over the interval $[0,nT]$, where the
functions $f, g, A, B$ and $C$ are periodic with period $T$.

Observation scheme (II) is relevant to situations where there is a "time-of-day"
or "seasonal" effect present in the model; for example, in the analysis of circadian
rhythm data in biology, or in the study of cyclic systems in control engineering;
see the review article of Bittanti and Guardabassi (1986). We are interested in the
asymptotic properties of estimators of $f$ and $g$ as $n \to \infty$ with $T$ remaining fixed.
We shall see that $f$ and $g$ are not identifiable unless one of them is absent from the
model (i.e. $f = 0$, $g \ne 0$ or $f \ne 0$, $g = 0$).

There is a vast literature on the estimation of finite dimensional parameters
in discrete time linear stochastic systems; refer to the books of Davis and Vinter
(1984) and Kumar and Varaiya (1986). In continuous time such problems were first
studied by Balakrishnan (1973). Further contributions have been made by Bagchi
(1980), Tugnait (1980) and Bagchi and Borkar (1984). Nonparametric estimation
for linear stochastic systems is considered to be a difficult problem; see, for instance,
the closing comment of a recent paper of Aihara and Bagchi (1989). In general the
functions $A$, $C$, $f$ and $g$ are not even identifiable. In the present paper we are
studying the very special case in which $A$, $B$ and $C$ are known, and at least one of
the trend functions is known to be absent.

There is an extensive literature on nonparametric estimation for the drift (or
trend) function, $g$, in a diffusion process satisfying (2) with $Z$ as a Wiener process;
see Ibragimov and Khasminski (1980, 1981), Geman and Hwang (1983), Nguyen
and Pham (1982), Beder (1987), McKeague (1986), who allowed $Z$ to be a general
square integrable martingale, and Leskow (1989), who considered the case of a
periodic model. These authors use either Parzen-Rosenblatt type kernel estimators
or Grenander (1980) sieve estimators for $g$, but those estimators are not directly
applicable to the present setting, unless $C$ is identically zero (in which case only $g$
is identifiable). We shall find that there is a function $h$, related to $g$ and $f$ through
two Volterra integral equations, and $h$ can be estimated by kernel or sieve type
estimators. Estimates of $g$ and $f$ can then be obtained by inserting estimates of $h$
or its first derivative $h'$ in the solutions of the Volterra integral equations.

The paper is organized as follows. Section 2 contains introductory discussion
concerning the basic innovations representation of the observation process,
identifiability, bias under misspecified trends, and observation schemes (I) and (II).
Estimation of the trend in the measurement process under schemes (I) and (II) is
treated in Sections 3 and 4 respectively. In Section 5 we consider estimation of the
trend in the state process. In these sections, to simplify the presentation, we assume
that the state and measurement processes are one-dimensional ($p = q = 1$). Section 6
contains remarks on the multi-dimensional case. In Section 7 we indicate some
directions for further work.

To conclude this section we shall briefly put our problem in perspective with
other inference problems for stochastic processes. Statistical models for stochastic
processes are of two broad types. If we observe a process $Y = (Y_t,\ t \ge 0)$ and we
have a covariate process $X = (X_t,\ t \ge 0)$ to incorporate into the analysis, then we
may consider a partially specified model in which, loosely speaking (see Greenwood
(1988) for a more precise definition), only the conditional distribution of $Y$ given $X$
is specified in terms of an unknown parameter $\theta$. Alternatively, we may know the
full joint distribution of $(Y, X)$ for each $\theta$, in which case we have a fully specified
model. Partially specified models are especially useful and widely applied in the
analysis of life history data by taking $Y$ as a counting process describing the times
of events in the life of an individual, and $X$ representing a covariate process specific
to the individual, the structure of the marginal distribution of $X$ being unspecified;
see Arjas and Haara (1984) and Andersen et al. (1988). Fully specified models on
the other hand are widely used in the engineering sciences where precise models
for the covariate process $X$ can often be developed from well-understood system
dynamics. Our model is of this latter type.

Observation schemes may similarly be classified into two broad types: partial
and full. In the survival analysis setting partial observation may arise from
censoring, truncation, or grouping of the data, see Andersen et al. (1988) and McKeague
(1988). It arises in the stochastic systems setting when the state of the system is
observed in the presence of noise, as in (1). Despite the diverse applications of such
schemes and models, there is a surprising unity to the techniques used. For example,
our kernel function techniques are similar to the methods used by Ramlau-Hansen
(1983) for the estimation of counting process intensities, and our approach to the
periodic case in some ways resembles that of Pons and de Turckheim (1988) to Cox's
periodic regression model.

2.  The  innovations  representation 

We shall assume throughout that the functions $f, g, A, B, C$ and $u$ are smooth,
and $C(t)$ does not vanish anywhere on $[0,T]$. The equations for the Kalman-Bucy
filter (see Kallianpur 1980, Section 10.3) are

$$d\hat X(t) = [f(t) + A(t)\hat X(t) + B(t)u(t)]\,dt + D(t)\,d\nu(t)$$
$$d\nu(t) = dY(t) - [g(t) + C(t)\hat X(t)]\,dt, \tag{3}$$

where $\hat X(0) = m$. The process $\nu$ is the so-called innovations process which is known
to be a standard Wiener process. The function $D$ is the Kalman gain which in the
present set-up does not depend on $f$ or $g$. In fact $D(t) = P(t)C(t)$ where $P$ is the
unique positive solution to the Riccati differential equation

$$P'(t) = 2A(t)P(t) - C^2(t)P^2(t) + 1, \tag{4}$$

with initial condition $P(0) = \mathrm{Var}(X(0))$. From (3) we have

$$d\hat X(t) = [A(t) - D(t)C(t)]\hat X(t)\,dt + [f(t) + B(t)u(t) - D(t)g(t)]\,dt + D(t)\,dY(t).$$
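
As an illustration, the gain $D$ can be computed numerically by integrating the Riccati equation (4) forward in time. The following is only a minimal sketch for the scalar case, assuming a classical fourth-order Runge-Kutta scheme and the relation $D = PC$ for unit noise intensities; the function name `riccati_gain` and its arguments are hypothetical and not part of the paper.

```python
def riccati_gain(A, C, p0, T, steps=1000):
    """Integrate P' = 2 A(t) P - C(t)^2 P^2 + 1 on [0, T] with RK4 (scalar case).

    A, C : callables t -> float (assumed known coefficient functions)
    p0   : initial condition P(0) = Var(X(0))
    Returns (ts, ps, ds) with ds[k] = ps[k] * C(ts[k]), the gain on the grid.
    """
    def rhs(t, p):
        return 2.0 * A(t) * p - (C(t) * p) ** 2 + 1.0

    dt = T / steps
    t, p = 0.0, p0
    ts, ps = [t], [p]
    for _ in range(steps):
        k1 = rhs(t, p)
        k2 = rhs(t + dt / 2, p + dt * k1 / 2)
        k3 = rhs(t + dt / 2, p + dt * k2 / 2)
        k4 = rhs(t + dt, p + dt * k3)
        p += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
        ts.append(t)
        ps.append(p)
    ds = [pp * C(tt) for tt, pp in zip(ts, ps)]
    return ts, ps, ds
```

With constant coefficients $A = -1$, $C = 1$ the solution settles at the positive root of $-2P - P^2 + 1 = 0$, which gives a simple sanity check of the sketch.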

Using Theorem 4.2.4 of Davis (1977) we can solve this equation for $\hat X$. Substituting
the solution into the second equation in (3) we obtain the following innovations
representation for $Y$:

$$Y(t) = \int_0^t [h(s) + U(s)]\,ds + \nu(t), \tag{5}$$

where

$$h(t) = g(t) + C(t)\int_0^t \Psi(t,s)[f(s) - D(s)g(s)]\,ds. \tag{6}$$

Here $\Psi$ is the solution to the linear time-varying system

$$\frac{\partial}{\partial t}\Psi(t,s) = [A(t) - D(t)C(t)]\Psi(t,s), \quad \Psi(s,s) = 1,$$

and $U$ is given by

$$U(t) = C(t)\Big\{\Psi(t,0)\,m + \int_0^t \Psi(t,s)[B(s)u(s)\,ds + D(s)\,dY(s)]\Big\}.$$

The representation (5) will be of prime importance in the sequel.

Identifiability of f and g.

We see from (5) that the function $h$ is identifiable given observation of $Y$ and
$U$; however, $f$ and $g$ are identifiable only in so far as they are uniquely determined
in terms of $h$ through (6). Thus, the functions $f$ and $g$ are not in general
simultaneously identifiable from observation of $Y$. However, if the trend is absent from the
measurement process ($g = 0$) then (6) reduces to

$$h(t) = \int_0^t \Lambda(t,s) f(s)\,ds, \tag{7}$$

where $\Lambda(t,s) = C(t)\Psi(t,s)$. If the trend is absent from the state process ($f = 0$)
then (6) reduces to

$$h(t) = g(t) + \int_0^t \Gamma(t,s) g(s)\,ds, \tag{8}$$

where $\Gamma(t,s) = -C(t)\Psi(t,s)D(s)$.

As equations involving the unknown $f$ and $g$, (7) and (8) are linear Volterra
integral equations of the first and second kind respectively. It follows from standard
results on Volterra equations (see Linz, 1985) that (8) has a unique solution for $g$,
and (7) has a unique solution for $f$ provided $C(t)$ does not vanish on $[0,T]$. Since
$h$ is identifiable, the trend $f$ is identifiable when $g = 0$, and $g$ is identifiable when
$f = 0$.
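
To make the second-kind Volterra equation concrete, the sketch below solves $h(t) = g(t) + \int_0^t \Gamma(t,s) g(s)\,ds$ for $g$ on a uniform grid using the trapezoidal rule. This is one standard discretization, chosen here as an assumption rather than taken from the paper; the name `solve_volterra2` and its signature are hypothetical.

```python
def solve_volterra2(h, kernel, T, n):
    """Solve h(t) = g(t) + int_0^t kernel(t, s) g(s) ds for g on [0, T].

    h      : callable t -> float (known left-hand side)
    kernel : callable (t, s) -> float, playing the role of Gamma(t, s)
    n      : number of grid steps; trapezoidal quadrature is used
    Returns (t_grid, g_grid).
    """
    dt = T / n
    t = [i * dt for i in range(n + 1)]
    g = [h(t[0])]  # at t = 0 the integral vanishes, so g(0) = h(0)
    for i in range(1, n + 1):
        # trapezoid weights: 1/2 at both end points, 1 in the interior
        acc = 0.5 * kernel(t[i], t[0]) * g[0]
        for j in range(1, i):
            acc += kernel(t[i], t[j]) * g[j]
        # the unknown g[i] appears with weight dt/2 and is solved for explicitly
        g.append((h(t[i]) - dt * acc) / (1.0 + 0.5 * dt * kernel(t[i], t[i])))
    return t, g
```

For example, with kernel identically $-1$ and $h \equiv 1$ the exact solution is $g(t) = e^t$, which the scheme reproduces to second order in the step size.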
The log-likelihood function.

The innovations representation (5) allows us to write down an explicit
expression for the log-likelihood function $L(h) = \log[d\mu_h/d\mu_W](Y)$, where $\mu_h$ is the
measure induced on $C[0,T]$ by $Y$, and $\mu_W$ is Wiener measure. By Liptser and
Shiryayev (1977, Theorem 7.7) we have that $\mu_h \ll \mu_W$ and

$$L(h) = \int_0^T \pi(s)\,dY(s) - \frac{1}{2}\int_0^T \pi(s)^2\,ds, \tag{9}$$

where $\pi(s)$ is the term inside the square brackets in (5).

The bias caused by misspecified trends.

What is the effect on the mean square error of the Kalman filter (3) of using
incorrect trend functions $f^* \ne f$, $g^* \ne g$? The answer to this question should
provide us with a modus operandi for choosing estimators $\hat f$, $\hat g$ to be used in place
of the unknown $f$, $g$. Let $\hat X_{f^*,g^*}(t)$ denote the Kalman filter estimate of $X(t)$ based
on (3) with $f^*$, $g^*$ in place of $f$, $g$. The bias caused by using $f^*$, $g^*$ instead of $f$, $g$
at time $t$,

$$\mathrm{BIAS}(f^*, g^*, t) = E[\hat X_{f^*,g^*}(t) - X(t)],$$

can be found from (3), cf. Jazwinski (1970, p.252),

$$\mathrm{BIAS}(f^*, g^*, t) = \int_0^t \Psi(t,s)\big[f^*(s) - f(s) + D(s)(g(s) - g^*(s))\big]\,ds.$$

The increase in the mean square error caused by using estimators $f^*, g^*$ instead of
$f, g$ is solely due to this nonrandom bias and is given by $[\mathrm{BIAS}(f^*, g^*, t)]^2$.
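
The bias formula can be checked directly from (3). A short sketch of the argument follows; the symbols $\hat X^*$ (the filter run with the misspecified trends $f^*, g^*$) and $\delta = \hat X^* - \hat X$ are introduced here only for illustration.

```latex
\begin{align*}
d\hat X^*(t) &= [A(t) - D(t)C(t)]\hat X^*(t)\,dt
               + [f^*(t) + B(t)u(t) - D(t)g^*(t)]\,dt + D(t)\,dY(t),\\
d\hat X(t)   &= [A(t) - D(t)C(t)]\hat X(t)\,dt
               + [f(t) + B(t)u(t) - D(t)g(t)]\,dt + D(t)\,dY(t),\\
\delta'(t)   &= [A(t) - D(t)C(t)]\,\delta(t)
               + f^*(t) - f(t) + D(t)\,(g(t) - g^*(t)), \qquad \delta(0) = 0,\\
\delta(t)    &= \int_0^t \Psi(t,s)\,\big[f^*(s) - f(s) + D(s)\,(g(s) - g^*(s))\big]\,ds .
\end{align*}
```

Since the correctly specified filter satisfies $E[\hat X(t) - X(t)] = 0$, the nonrandom difference $\delta(t)$ is exactly the bias of the misspecified filter.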

Observation  scheme  (I). 

The processes associated with the $i$th realization are given the subscript $i$, as in
$u_i$, $U_i$, $\nu_i$, $L_i(h)$ etc. Note that although the observed processes $\{Y_i,\ i = 1,\dots,n\}$
are independent, they are not necessarily identically distributed since the inputs $u_i$
are not assumed to be identical for each $i$. However, the innovations processes $\nu_i$
are i.i.d. Wiener processes. From (5) we have

$$Y_i(t) = \int_0^t [h(s) + U_i(s)]\,ds + \nu_i(t), \tag{10}$$

where

$$U_i(t) = C(t)\Big\{\Psi(t,0)\,m + \int_0^t \Psi(t,s)[B(s)u_i(s)\,ds + D(s)\,dY_i(s)]\Big\}.$$

The log-likelihood function is given by

$$\sum_{i=1}^n L_i(h).$$

Observation  scheme  (II). 

Scheme (II) can be treated using a similar framework to scheme (I). Let $h_i$, $U_i$,
$Y_i$ and $\nu_i$ be the following restrictions of $h$, $U$, $Y$ and $\nu$ to the $i$-th period:

$$h_i(t) = h(iT + t)$$
$$U_i(t) = U(iT + t)$$
$$Y_i(t) = Y(iT + t) - Y(iT)$$
$$\nu_i(t) = \nu(iT + t) - \nu(iT)$$

$0 \le t \le T$. These processes satisfy

$$Y_i(t) = \int_0^t [h_i(s) + U_i(s)]\,ds + \nu_i(t). \tag{10'}$$

Since $\nu$ is a Wiener process (which has stationary independent increments), the
processes $\nu_i$, $i = 1,\dots,n$ are also i.i.d. Wiener processes on $[0,T]$. The log-likelihood
function given by (9) with $T$ replaced by $nT$ can be written as

$$\sum_{i=1}^n L_i(h_i).$$

Note that the function $h$ is not periodic so that $h_i \ne h$. This is the basic difference
between schemes (I) and (II), making (II) much harder to analyze.
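
The reduction of a single long trajectory to per-period pieces can be sketched as follows for a sampled path. This is only an illustration under the assumption that $Y$ is recorded on a uniform grid with a whole number of steps per period; the helper `fold_periods` is hypothetical and, unlike the text, numbers the periods from 0.

```python
def fold_periods(y, steps_per_period):
    """Split samples y[k] = Y(k * dt) into per-period paths Y_i(t) = Y(iT + t) - Y(iT).

    steps_per_period : number of grid steps in one period T (assumed an integer)
    Returns a list of paths, each of length steps_per_period + 1 and starting at 0.
    """
    m = steps_per_period
    n = (len(y) - 1) // m  # number of complete periods available
    return [[y[i * m + k] - y[i * m] for k in range(m + 1)] for i in range(n)]
```

Each returned path can then be treated like one realization in scheme (I), keeping in mind that the associated $h_i$ differ from period to period.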


3.  Trend  in  the  measurement  process — i.i.d.  case 

In this section we consider estimation of $g$ under scheme (I) with $f = 0$. First
we introduce estimators $\hat g$ of $g$ such that $\mathrm{BIAS}(0, \hat g, t) \to 0$ uniformly in $t$ as $n \to \infty$
a.s. In fact, a fortiori, $\hat g$ will be shown to be strongly $L^2$-consistent in the sense
that $\|\hat g - g\| \to 0$ a.s. as $n \to \infty$, where $\|\cdot\|$ denotes the norm in $L^2[0,T]$.

The basic idea is to take as an estimator of $g$ the solution $\hat g$ of the Volterra
integral equation

$$\hat h(t) = \hat g(t) + \int_0^t \Gamma(t,s)\hat g(s)\,ds, \tag{11}$$

where $\hat h$ is an estimator of $h$. Note that the estimator so obtained is well defined
since (11) admits a unique solution whenever $\hat h \in L^2[0,T]$; see Davis (1977, p.
125). Moreover, should $\hat h$ be a strongly $L^2$-consistent estimator of $h$, the following
theorem shows that $\hat g$ is also strongly $L^2$-consistent.

Theorem 1. Let $\hat h$ be a strongly $L^2$-consistent estimator of $h$. Then the solution
of the Volterra integral equation (11) is a strongly $L^2$-consistent estimator of $g$.

Proof: Let $\tilde h = \hat h - h$ and $\tilde g = \hat g - g$ be the estimation errors of $h$ and $g$ respectively
and denote $M = \sup_{0 \le s \le t \le T} |\Gamma(t,s)| < \infty$. Now from (8) and (11)

$$\tilde g(t) = \tilde h(t) - \int_0^t \Gamma(t,s)\tilde g(s)\,ds$$

so that

$$|\tilde g(t)|^2 \le 2TM^2\int_0^t |\tilde g(s)|^2\,ds + 2|\tilde h(t)|^2.$$

Using Gronwall's inequality (see Kallianpur, 1980, p.94) we then have

$$|\tilde g(t)|^2 \le 4TM^2\int_0^t |\tilde h(s)|^2 \exp[2TM^2(t - s)]\,ds + 2|\tilde h(t)|^2$$
$$\le 4TM^2\exp[2T^2M^2]\int_0^T |\tilde h(s)|^2\,ds + 2|\tilde h(t)|^2.$$

Integrating this last inequality over the interval $[0,T]$ we easily get

$$\|\tilde g\|^2 \le (2 + 4T^2M^2\exp[2T^2M^2])\,\|\tilde h\|^2.$$

This completes the proof. □

Orthogonal series sieve estimators for h.

The maximum of the log-likelihood function is not attained when we
maximize over the whole parameter space $L^2[0,T]$. The problem is that the
parameter space is too large for the existence of the unconstrained maximum
likelihood estimator. One remedy is to apply the method of sieves which consists
in maximizing the log-likelihood function over an increasing sequence of subsets
$S_n$, $n = 1, 2, \dots$ of the parameter space. We shall use an orthogonal series sieve
$S_n = \mathrm{span}\{\psi_r,\ r = 1,\dots,d_n\}$, where $\{\psi_r,\ r \ge 1\}$ is a complete orthonormal
sequence in $L^2[0,T]$ and $d_n \to \infty$ as $n \to \infty$.

Let the coordinates of $h \in L^2[0,T]$ with respect to the basis $\{\psi_r,\ r \ge 1\}$ be
denoted $(h_r,\ r \ge 1)$ and denote the vector $(h_1,\dots,h_{d_n})'$ by $h^{(n)}$. Then, omitting
terms not involving $h$, for $h \in S_n$ the log-likelihood is

$$h^{(n)\prime}(Q^{(n)} - P^{(n)}) - \frac{n}{2}\,h^{(n)\prime}h^{(n)} \tag{12}$$

where $Q^{(n)}$ and $P^{(n)}$ are $d_n \times 1$ vectors with components

$$Q_r^{(n)} = \sum_{i=1}^n \int_0^T \psi_r(t)\,dY_i(t)$$
$$P_r^{(n)} = \sum_{i=1}^n \int_0^T \psi_r(t)U_i(t)\,dt.$$

Maximizing (12) with respect to $h^{(n)}$ we obtain

$$\hat h(t) = \sum_{r=1}^{d_n} \hat h_r\,\psi_r(t) \tag{13}$$

where $\hat h^{(n)} = [\hat h_1,\dots,\hat h_{d_n}]'$ is given by

$$\hat h^{(n)} = \frac{1}{n}\,(Q^{(n)} - P^{(n)}). \tag{14}$$

Theorem 2. Suppose that $d_n \to \infty$ and $d_n/n \to 0$ as $n \to \infty$. Then the orthogonal
series sieve estimator $\hat h$ given by (13) is a strongly $L^2$-consistent estimator of $h$.

Proof: It suffices to show that $\|\hat h^{(n)} - h^{(n)}\| \to 0$ a.s., where $\|\cdot\|$ can also denote the
euclidean norm, depending on the context. By (10) and (14) the $r$th component of
$\hat h^{(n)} - h^{(n)}$ is given by

$$(\hat h^{(n)} - h^{(n)})_r = n^{-1/2}\,\xi_r^{(n)},$$

where

$$\xi_r^{(n)} = n^{-1/2}\sum_{i=1}^n \int_0^T \psi_r(t)\,d\nu_i(t),$$

for $r = 1,\dots,d_n$. Thus

$$\|\hat h^{(n)} - h^{(n)}\|^2 = \frac{1}{n}\sum_{r=1}^{d_n} \big(\xi_r^{(n)}\big)^2.$$

Now $\xi_r^{(n)}$, $r = 1,\dots,d_n$ are i.i.d. $N(0,1)$ r.v.'s so that $n\|\hat h^{(n)} - h^{(n)}\|^2$ has a $\chi^2$
distribution with $d_n$ degrees of freedom. The proof is now completed using the
Borel-Cantelli type argument given by Beder (1987, Section 5). □

Remark. The rate $d_n = o(n)$ is the best possible for $L^2$-consistency of the orthogonal
series sieve estimators, cf. McKeague (1986) and Beder (1987).

Kernel estimators for h.

Let $K$ be a bounded kernel function having integral 1, support $[-1,1]$ and let
$b_n > 0$ be a bandwidth parameter. Define

$$\hat h(t) = \frac{1}{b_n}\int_0^T K\Big(\frac{t - s}{b_n}\Big)\,d\hat H(s), \tag{15}$$

$$\hat H(t) = \frac{1}{n}\sum_{i=1}^n \Big[Y_i(t) - \int_0^t U_i(s)\,ds\Big]. \tag{16}$$

Here $\hat H(t)$ estimates the function $H(t) = \int_0^t h(s)\,ds$.
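
The kernel estimator (15)-(16) can be approximated on a grid as in the sketch below. The Epanechnikov kernel is used purely as an example of a bounded kernel with integral 1 and support $[-1,1]$; it and the function names are assumptions for illustration, not choices made in the paper.

```python
def epanechnikov(u):
    """Example kernel: bounded, integral 1, support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def h_hat_increments(dY, U, T):
    """Increments of H-hat in (16) over each grid step.

    dY[i][k] : increment of Y_i over step k; U[i][k] : U_i at the step midpoint.
    """
    n, m = len(dY), len(dY[0])
    dt = T / m
    return [sum(dY[i][k] - U[i][k] * dt for i in range(n)) / n for k in range(m)]

def kernel_estimate(dH, T, b, t, K=epanechnikov):
    """Approximate (15): (1/b) * sum_k K((t - s_k)/b) * dH[k], s_k the step midpoint."""
    m = len(dH)
    dt = T / m
    return sum(K((t - (k + 0.5) * dt) / b) * dH[k] for k in range(m)) / b
```

At interior points and without noise, the estimate reproduces a constant $h$ up to quadrature error; near 0 and $T$ the kernel window is truncated and the estimate is biased.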

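Theorem 3. Suppose that $b_n \to 0$ and $n^{1-\delta}b_n \to \infty$ for some $0 < \delta < 1$. Then
the kernel estimator $\hat h$ given by (15) is a strongly $L^2$-consistent estimator of $h$.

Proof: First note that since $h$ is continuous, $\|h^{(n)} - h\| \to 0$ where $h^{(n)}$ is the
following smoothed version of $h$:

$$h^{(n)}(t) = \frac{1}{b_n}\int_0^T K\Big(\frac{t - s}{b_n}\Big)h(s)\,ds.$$

It remains to show that $\|\hat h - h^{(n)}\| \to 0$ a.s. From (15)

$$\hat h(t) - h^{(n)}(t) = (nb_n)^{-1/2}\,\epsilon^{(n)}(t),$$

where

$$\epsilon^{(n)}(t) = b_n^{-1/2}\int_0^T K\Big(\frac{t - s}{b_n}\Big)\,dW^{(n)}(s)$$

and $W^{(n)} = \sqrt{n}\,(\hat H - H)$. It follows from (10) and (16) that $W^{(n)}$ is a standard
Wiener process for all $n$. Thus $\epsilon^{(n)}(t)$ is Gaussian with mean zero and variance

$$\frac{1}{b_n}\int_0^T K^2\Big(\frac{t - s}{b_n}\Big)\,ds \le \int_{-1}^{1} K^2(u)\,du.$$

Fix $\eta > 0$ and let $k > 1/\delta$. Applying Hölder's inequality on $[0,T]$, Fubini's Theorem,
and noting that the $2k$-th moment of $\epsilon^{(n)}(t)$ is uniformly bounded in $n$ and $t$ we get

$$E\|\hat h - h^{(n)}\|^{2k} \le T^{k-1}\int_0^T E|\hat h(t) - h^{(n)}(t)|^{2k}\,dt = O((nb_n)^{-k}) = O(n^{-k\delta}).$$

By Chebyshev's inequality

$$P(\|\hat h - h^{(n)}\| > \eta) \le \eta^{-2k}E\|\hat h - h^{(n)}\|^{2k} = O(n^{-k\delta})$$

and since $k\delta > 1$ we have

$$\sum_{n=1}^{\infty} P(\|\hat h - h^{(n)}\| > \eta) < \infty$$

for all $\eta > 0$. The Borel-Cantelli lemma gives $\|\hat h - h^{(n)}\| \to 0$ a.s. □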


Asymptotic  distribution  results  for  estimators  of  g. 

Let $\gamma(t,s)$ be the resolvent kernel for $\Gamma(t,s)$, so the unique solution of (8) is
given by

$$g(t) = h(t) + \int_0^t \gamma(t,s)h(s)\,ds, \tag{17}$$

see Linz (1985, Theorem 3.3). Note that $h(\cdot)$ may be considered as the output of
the linear system

$$z'(t) = [A(t) - C(t)D(t)]z(t) + D(t)g(t), \quad z(0) = 0$$
$$h(t) = g(t) - C(t)z(t).$$

After a trivial manipulation this may be written as a linear system with input $h(\cdot)$
and output $g(\cdot)$:

$$z'(t) = A(t)z(t) + D(t)h(t), \quad z(0) = 0$$
$$g(t) = C(t)z(t) + h(t).$$

So we may identify the resolvent kernel $\gamma$ as

$$\gamma(t,s) = C(t)\Psi_A(t,s)D(s), \tag{18}$$

where $\Psi_A$ is the transition function of the system $x'(t) = A(t)x(t)$.
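
The resolvent identity can be checked numerically in a toy case. The sketch below assumes constant scalar coefficients (hypothetical values `A0`, `C0`, `D0`), for which the kernels in (8) and (18) reduce to exponentials, and verifies that applying (17) to $h$ built from (8) returns the original $g$. It is an illustration, not a procedure from the paper.

```python
import math

def trapz_conv(kernel, f, t, n=200):
    """Trapezoid approximation of int_0^t kernel(t, s) f(s) ds."""
    if t == 0.0:
        return 0.0
    ds = t / n
    vals = [kernel(t, j * ds) * f(j * ds) for j in range(n + 1)]
    return ds * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# Assumed constant coefficients, chosen only for illustration.
A0, C0, D0 = -0.5, 1.0, 0.8

def Gamma(t, s):
    """Kernel of (8): -C Psi(t, s) D with Psi(t, s) = exp((A - D C)(t - s))."""
    return -C0 * D0 * math.exp((A0 - D0 * C0) * (t - s))

def gamma(t, s):
    """Resolvent kernel (18): C Psi_A(t, s) D with Psi_A(t, s) = exp(A (t - s))."""
    return C0 * D0 * math.exp(A0 * (t - s))

def g_true(t):
    return 1.0  # hypothetical trend

def h(t):
    return g_true(t) + trapz_conv(Gamma, g_true, t)

def g_recovered(t):
    return h(t) + trapz_conv(gamma, h, t)
```

Up to quadrature error, `g_recovered(t)` agrees with `g_true(t)`, which is what (17) asserts.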

Let the estimator of $g$ corresponding to $\hat h$ be denoted $\hat g$, so that $\hat g$ is the solution
of the Volterra integral equation

$$\hat h(t) = \hat g(t) + \int_0^t \Gamma(t,s)\hat g(s)\,ds.$$

Now we can write $\hat g$ explicitly as

$$\hat g(t) = \hat h(t) + \int_0^t \gamma(t,s)\hat h(s)\,ds. \tag{19}$$

Our next result makes use of (19) to derive the asymptotic distribution of $\hat g$.


Theorem 4. Suppose that $nb_n \to \infty$ and $nb_n^3 \to 0$. Then for each $0 < t < T$

$$(nb_n)^{1/2}(\hat g(t) - g(t)) \xrightarrow{D} N(0, \kappa^2),$$

where $\kappa^2 = \int_{-1}^{1} K^2(u)\,du$.

Proof: From (17), (19) and the proof of Theorem 3 we have

$$(nb_n)^{1/2}(\hat g(t) - g(t)) = \epsilon^{(n)}(t) + (nb_n)^{1/2}(h^{(n)}(t) - h(t))$$
$$\qquad + b_n^{1/2}\,\eta^{(n)}(t) + (nb_n)^{1/2}\int_0^t \gamma(t,s)(h^{(n)}(s) - h(s))\,ds, \tag{20}$$

where

$$\eta^{(n)}(t) = \frac{1}{b_n}\int_0^t\!\!\int_0^T \gamma(t,s)\,K\Big(\frac{s - v}{b_n}\Big)\,dW^{(n)}(v)\,ds.$$

The first term on the right hand side of (20) is Gaussian with mean zero and
variance

$$\frac{1}{b_n}\int_0^T K^2\Big(\frac{t - s}{b_n}\Big)\,ds \to \kappa^2,$$

so that $\epsilon^{(n)}(t) \xrightarrow{D} N(0, \kappa^2)$. The remaining terms tend to zero in probability. For,
using a Fubini theorem for stochastic integrals (see Liptser and Shiryayev, 1977,
Theorem 5.15) we have

$$\eta^{(n)}(t) = \frac{1}{b_n}\int_0^T\Big[\int_0^t \gamma(t,s)\,K\Big(\frac{s - v}{b_n}\Big)\,ds\Big]\,dW^{(n)}(v),$$

so that $\eta^{(n)}(t)$ is Gaussian with mean zero and variance

$$\frac{1}{b_n^2}\int_0^T\Big[\int_0^t \gamma(t,s)\,K\Big(\frac{s - v}{b_n}\Big)\,ds\Big]^2 dv = O(1).$$

Thus the third term on the right hand side of (20) is of order $O_P(\sqrt{b_n})$. Since $h$ is
Lipschitz, the second and fourth terms are of order $O(\sqrt{nb_n^3})$. □

An alternative estimator for g based on the resolvent equation.

An equivalent way of writing equation (17) is

$$g(t) = h(t) + \int_0^t \gamma(t,s)\,dH(s), \tag{21}$$

so an alternative estimator for $g$ is

$$\hat g^a(t) = \hat h(t) + \int_0^t \gamma(t,s)\,d\hat H(s), \tag{22}$$

where $\hat h$ and $\hat H$ are given by (15) and (16). Not surprisingly, $\hat g^a(t)$ has the same
asymptotic distribution as $\hat g$.

Theorem 5. Suppose that $nb_n \to \infty$ and $nb_n^3 \to 0$. Then for each $0 < t < T$

$$(nb_n)^{1/2}(\hat g^a(t) - g(t)) \xrightarrow{D} N(0, \kappa^2),$$

where $\kappa^2 = \int_{-1}^{1} K^2(u)\,du$.

Proof: From (21), (22) and the proof of Theorem 3 we have

$$(nb_n)^{1/2}(\hat g^a(t) - g(t)) = \epsilon^{(n)}(t) + (nb_n)^{1/2}(h^{(n)}(t) - h(t)) + b_n^{1/2}\int_0^t \gamma(t,s)\,dW^{(n)}(s).$$

By the proof of Theorem 4 the first term $\epsilon^{(n)}(t) \xrightarrow{D} N(0, \kappa^2)$. The second term is of
order $O(\sqrt{nb_n^3})$, as in the proof of Theorem 4. Since the processes $W^{(n)}$, $n \ge 1$ are
standard Wiener processes, the last term is of order $O_P(\sqrt{b_n})$. □


4.  Trend  in  the  measurement  process — periodic  case 

In this section we consider estimation of $g$ under scheme (II) with $f = 0$ and
the functions $g, A, B$ and $C$ assumed to be periodic with period $T$. We need some
preliminary results.

Proposition 6. (Bittanti et al., 1984). If the pair $(A(\cdot), 1)$ is completely
controllable and the pair $(A(\cdot), C(\cdot))$ is completely observable then there exists a unique
positive $T$-periodic solution $\bar P$ to the Riccati differential equation (4). Moreover,
$\bar\Psi$, obtained by replacing $P$ by $\bar P$ in the definition of $\Psi$, is uniformly asymptotically
stable, i.e. there exist positive constants $K_1$ and $K_2$ such that

$$|\bar\Psi(t,s)| \le K_1\exp[-K_2(t - s)], \quad \text{for all } s \le t.$$

In the scalar case ($p = q = 1$), the pair $(A(\cdot), 1)$ is always completely
controllable, and the pair $(A(\cdot), C(\cdot))$ is completely observable under our assumption that
$C(\cdot)$ never vanishes on $[0,T]$, see Rubio (1971, Chapter 5). Thus Proposition 6 can
be applied directly in that case. Anyway, the hypotheses of Proposition 6 are very
natural in the context of linear systems (see Rubio, 1971, Chapter 5).

We shall need the following assumption:

(A) $\int_{-\infty}^t \Psi_A(t,s)\,ds < \infty$ and $\int_{-\infty}^t \Psi_A^2(t,s)\,ds < \infty$ for all $t \in [0,T]$.

Now introduce the function

$$h_\infty(t) = g(t) + \int_{-\infty}^t \bar\Gamma(t,s)g(s)\,ds \tag{23}$$

where $\bar\Gamma(t,s) = -C(t)\bar\Psi(t,s)C(s)\bar P(s)$. Also define $\bar\gamma(t,s) = C(t)\Psi_A(t,s)C(s)\bar P(s)$.

The following lemma shows that there is a useful analogy to the important
representation (17) in the periodic case, with the functions $h_\infty$ and $\bar\gamma$ playing similar
roles to $h$ and $\gamma$.
Lemma 7.

(a) $h_\infty(\cdot)$ is $T$-periodic.

(b) There exists a positive constant $R$ such that $\sup_{t \in [0,T]} |h_i(t) - h_\infty(t)| = O(e^{-Ri})$
as $i \to \infty$.

(c) If $\int_{-\infty}^t \Psi_A(t,s)\,ds < \infty$, then

$$g(t) = h_\infty(t) + \int_{-\infty}^t \bar\gamma(t,s)h_\infty(s)\,ds, \quad t \in [0,T]. \tag{24}$$

Proof: From the standard theory of linear O.D.E.'s

$$\bar\Psi(t,s) = \exp\Big\{\int_s^t [A(u) - C^2(u)\bar P(u)]\,du\Big\}$$

so that, by the periodicity of $A$, $C$ and $\bar P$, $\bar\Psi$ has the property

$$\bar\Psi(iT + t, iT + s) = \bar\Psi(t,s) \quad \text{for all } s \le t. \tag{25}$$

This property also holds for $\bar\Gamma$. Part (a) then follows using the periodicity of $g$.
Next, letting $\bar h_i$ be defined by replacing $P$ by $\bar P$ in the definition of $h_i$, we have

$$\bar h_i(t) = g(iT + t) + \int_{-iT}^t \bar\Gamma(iT + t, iT + s)\,g(iT + s)\,ds$$
$$= g(t) + \int_{-iT}^t \bar\Gamma(t,s)g(s)\,ds$$
$$= h_\infty(t) - \int_{-\infty}^{-iT} \bar\Gamma(t,s)g(s)\,ds. \tag{26}$$

Note that $g(s)\bar\Gamma(t,s)$ is uniformly asymptotically stable by Proposition 6 and the
boundedness of $g$ and $C$. Thus by (26) and elementary integration,
$\sup_{t \in [0,T]} |\bar h_i(t) - h_\infty(t)| = O(e^{-Ri})$ as $i \to \infty$. Here and in what follows, $R$ denotes a
generic positive constant which does not depend on $T$ and which may change from use to use. To
complete the proof of (b) we need to show that $\sup_{t \in [0,T]} |h_i(t) - \bar h_i(t)| = O(e^{-Ri})$
as $i \to \infty$. Now,

$$|h_i(t) - \bar h_i(t)| \le O(1)\int_0^{iT+t} |\Gamma(iT + t, s) - \bar\Gamma(iT + t, s)|\,ds$$
$$\le O(1)\int_0^{iT+t} |\Psi(iT + t, s) - \bar\Psi(iT + t, s)|\,ds
 + O(1)\int_0^{iT+t} \bar\Psi(iT + t, s)\,|P(s) - \bar P(s)|\,ds, \tag{27}$$

since $P(\cdot)$ is bounded by Roitenberg (1974, p.425). The first term in (27) is bounded
above by


0(1)  V  r  + 

+  0(1)  r*'m  +  t,s)  _  1 

JiT 


ds. 


Use asymptotic stability of $\bar\Psi$ to bound the sum of the first $[i/2]$ terms above by
$O(e^{-Ri})$. We can also bound the sum of the remaining terms by $O(e^{-Ri})$ as follows.
Writing $P_i(u) = P(iT + u)$ for $0 \le u \le T$, use the rate

$$\sup_{u \in [0,T]} |P_i(u) - \bar P(u)| = O(e^{-Ri}) \tag{28}$$

given by Roitenberg (1974, Théorème 6, p.431), to obtain for $s \in [(r-1)T, rT]$

$$\int_s^{iT+t} |P(u) - \bar P(u)|\,du \le O(1)\sum_{j=r-1}^{i+1}\int_0^T |\bar P(v) - P_j(v)|\,dv
 \le O(1)\sum_{j=r-1}^{i+1} e^{-Rj} = O(e^{-Rr}),$$

uniformly in $t$. Then, also using the inequality $|e^x - 1| \le 3|x|$ for $|x| \le 1$, the sum
of the "remaining terms" above has the form $\sum_{r=[i/2]+1}^{i} O(e^{-Rr}) = O(e^{-Ri})$, as
required. The last term in (27) is treated in a similar fashion. This proves (b).

Under the hypothesis of part (c), the kernel $\bar\gamma$ satisfies $\int_{-\infty}^t \bar\gamma(t,s)\,ds < \infty$.
Also note that $\bar\gamma$ satisfies the property (25). Thus, since $g$ is $T$-periodic,

$$g(t) = g(iT + t) = h_i(t) + \int_{-iT}^t \gamma(iT + t, iT + s)\,h_i(s)\,ds$$
$$= h_i(t) + \int_{-iT}^t \bar\gamma(t,s)h_i(s)\,ds + O(1)\int_{-iT}^t \Psi_A(t,s)\,|P_i(s) - \bar P(s)|\,ds$$
$$\to h_\infty(t) + \int_{-\infty}^t \bar\gamma(t,s)h_\infty(s)\,ds,$$

as $i \to \infty$, by the dominated convergence theorem, part (b) of the lemma, and (28).
This proves (c). □

With the help of Lemma 7 it is now possible to develop results analogous to
those of Section 3. For the purposes of illustration we shall discuss kernel estimators.
Define the kernel estimator $\hat h_\infty$ of $h_\infty$ to be the $T$-periodic function coinciding with
$\hat h$ given by (15). Then, in view of (24), it is natural to estimate $g$ by

$$\hat g(t) = \hat h_\infty(t) + \int_{-\infty}^t \bar\gamma(t,s)\hat h_\infty(s)\,ds, \quad t \in [0,T]. \tag{29}$$

Theorem 8. Suppose that (A) holds. Then the entire statement of Theorem 4
carries over to the periodic case, giving the asymptotic distribution of the estimator
$\hat g$ defined by (29).

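Proof: The proof is very similar to the scheme (I) case. Use (24) and (29) to
obtain a periodic version of (20):

$$(nb_n)^{1/2}(\hat g(t) - g(t)) = \epsilon^{(n)}(t) + (nb_n)^{1/2}(h_n^*(t) - h_\infty(t))$$
$$\qquad + b_n^{1/2}\,\eta^{(n)}(t) + (nb_n)^{1/2}\int_{-\infty}^t \bar\gamma(t,s)(h_n^*(s) - h_\infty(s))\,ds, \tag{30}$$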

where 


Jo  J 


ds,  f€[0,T] 


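$h_n^*$ is the $T$-periodic extension of

$$\frac{1}{nb_n}\sum_{i=1}^n \int_0^T K\Big(\frac{t - s}{b_n}\Big)h_i(s)\,ds, \quad t \in [0,T],$$

to the whole real line, and

$$W^{(n)} = n^{-1/2}\sum_{i=1}^n \nu_i.$$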

It follows from (10') and (16) that $W^{(n)}$ is a standard Wiener process for all $n$, so
that, as in the proof of Theorem 4, $\epsilon^{(n)}(t) \xrightarrow{D} N(0, \kappa^2)$. By Lemma 7 (b)

$$\sup_{t \in [0,T]} \Big|\frac{1}{n}\sum_{i=1}^n h_i(t) - h_\infty(t)\Big| = O(n^{-1})$$

so that, since $h_\infty$ is Lipschitz, the second term on the right hand side of (30) is of
order

$$(nb_n)^{1/2}(h_n^*(t) - h_\infty(t)) = O\big(\sqrt{b_n/n}\big) + O\big(\sqrt{nb_n^3}\big).$$

Using Condition (A) it can be shown that the last term on the right hand side of
(30) is of the same order. Using Condition (A) again, the third term can be shown
to be of order $O_P(\sqrt{b_n})$, as in the proof of Theorem 4. □

5.  Trend  in  the  state  process 

Throughout this section it is assumed that the trend in the measurement
process is zero. We shall introduce an estimator $\hat f$ of the trend $f$ in the state process
such that $\mathrm{BIAS}(\hat f, 0, t) \to 0$ uniformly in $t$ as $n \to \infty$ a.s. In fact $\hat f$ is shown to
be strongly $L^2$-consistent. We shall only consider the case of observation scheme
(I) since our results can be extended easily to scheme (II) along the lines that we
extended our results on estimation of $g$ in Section 4.

To estimate $f$ we need to consider (7), which is a linear Volterra integral
equation of the first kind. The usual way to deal with such equations is to convert them
into Volterra equations of the second kind by differentiation, see Linz (1985, p.67).
In fact, using this technique, we may solve (7) explicitly for $f$. Since $C(t)$ does not
vanish on $[0,T]$, we obtain

$$f(t) = \frac{h'(t)}{C(t)} + F(t)h(t) \tag{31}$$

where

$$F(t) = D(t) - \frac{C'(t)}{C^2(t)} - \frac{A(t)}{C(t)}.$$
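
The step from (7) to (31) can be spelled out as follows. This is a sketch of the differentiation argument for the scalar case with $g = 0$; the auxiliary function $z$ is introduced here only for illustration.

```latex
\begin{align*}
z(t) &:= \int_0^t \Psi(t,s) f(s)\,ds, \qquad h(t) = C(t)\,z(t),\\
z'(t) &= [A(t) - D(t)C(t)]\,z(t) + f(t),\\
h'(t) &= C'(t)\,z(t) + C(t)\,z'(t)
       = \frac{C'(t)}{C(t)}\,h(t) + [A(t) - D(t)C(t)]\,h(t) + C(t)\,f(t),\\
f(t) &= \frac{h'(t)}{C(t)} + \Big[D(t) - \frac{C'(t)}{C^2(t)} - \frac{A(t)}{C(t)}\Big]\,h(t).
\end{align*}
```

The second line uses the defining equation of $\Psi$ together with $\Psi(t,t) = 1$, and the division by $C(t)$ is where the non-vanishing assumption on $C$ enters.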


Thus the problem of estimating $f$ is similar to the problem of estimating $g$, except
that now we need to estimate $h'$ as well as $h$. We shall only consider kernel
estimators of $h'$, although trigonometric series sieve estimators (see Ibragimov and
Khasminski, 1980) could equally well be used.

Let $K$ be a kernel function, as in Section 3, but in addition assume that $K$ is
differentiable. Let $c_n$ be a bandwidth parameter, different from $b_n$. Define


$\hat h'(t) =$    (32)

The following result, stated without proof, is similar to Theorem 3.


Theorem  9.  Suppose  that  Cn  0  and  Cnn^  *  — ♦  oo  where  0  <  S  <  j.  Then  the 
kernel estimator $\hat h'$ given by (32) is a strongly $L^2$-consistent estimator of $h'$.
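
The idea of estimating both $h$ and $h'$ by kernel smoothing can be illustrated numerically. The sketch below is not the estimator (32) itself, whose exact form is not recoverable here. It assumes, purely for illustration, that the data reduce to $n$ independent copies of $dN_i(t) = h(t)\,dt + dW_i(t)$ observed on a fine grid, smooths the averaged increments with a kernel $K$ and bandwidth $b$ to estimate $h$, and with $K'$ and bandwidth $c$ (scaled by $c^{-2}$) to estimate $h'$. The biweight kernel, the function names and all numerical values are assumptions made for this example.

```python
import numpy as np

def biweight(u):
    """Biweight kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def biweight_deriv(u):
    """Derivative K'(u) = -(15/4) u (1 - u^2) on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, -15.0 / 4.0 * u * (1.0 - u**2), 0.0)

def simulate_increments(h, n, m, T, rng):
    """Simulate n replicates of dN = h dt + dW on m grid cells of [0, T]."""
    dt = T / m
    s = (np.arange(m) + 0.5) * dt  # cell midpoints
    dN = h(s) * dt + np.sqrt(dt) * rng.standard_normal((n, m))
    return s, dN

def kernel_estimates(t, s, dN, b, c):
    """Return (estimate of h(t), estimate of h'(t)) from increments dN (n x m)."""
    mean_inc = dN.mean(axis=0)  # average increment over the n replicates
    h_hat = np.sum(biweight((t - s) / b) * mean_inc) / b
    dh_hat = np.sum(biweight_deriv((t - s) / c) * mean_inc) / c**2
    return h_hat, dh_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = lambda x: np.sin(2.0 * np.pi * x)  # hypothetical trend, h'(0.25) = 0
    s, dN = simulate_increments(h, n=2000, m=1000, T=1.0, rng=rng)
    print(kernel_estimates(0.25, s, dN, b=0.1, c=0.2))
```

In this toy setting the derivative estimate has variance of order $1/(nc^3)$, against $1/(nb)$ for the estimate of $h$, so the derivative estimate is much noisier at comparable bandwidths. That is one reason to let $c_n$ differ from $b_n$.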

In view of this result and (31) it is reasonable to estimate $f$ by

$$\hat f(t) = \frac{\hat h'(t)}{C(t)} + F(t)\hat h(t) \tag{33}$$

where $\hat h$, given by (15), is the kernel estimator of $h$. Under the joint conditions of
Theorems 3 and 9 we see that $\hat f$ is a strongly $L^2$-consistent estimator of $f$. Finally,
we give an asymptotic distribution result for $\hat f$.

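Theorem 10. Suppose that $nb_n \to \infty$, $nb_n^3 \to 0$, $nc_n^3 \to \infty$, $nc_n^5 \to 0$ and
$c_n = o(b_n^{1/3})$. Then for each $0 < t < T$

$$(nc_n^3)^{1/2}\bigl(\hat f(t) - f(t)\bigr) \xrightarrow{\mathcal{D}} N(0, \sigma^2(t)),$$

where

$$\sigma^2(t) = \frac{\int K'(u)^2\,du}{C(t)^2}.$$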


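Proof: Directly from (31) and (33),

$$(nc_n^3)^{1/2}\bigl(\hat f(t) - f(t)\bigr) = \frac{(nc_n^3)^{1/2}\bigl(\hat h'(t) - h'(t)\bigr)}{C(t)} + (nc_n^3)^{1/2}F(t)\bigl(\hat h(t) - h(t)\bigr).$$

It can be shown, using a similar approach to the proof of Theorem 4, that the first
term on the right hand side tends in distribution to $N(0, \sigma^2(t))$. Also from the
proof of Theorem 4, and using the condition $c_n = o(b_n^{1/3})$, the second term on the
right hand side is seen to be of order $o_p(1)$, since $(nc_n^3)^{1/2} = (c_n^3/b_n)^{1/2}(nb_n)^{1/2}$
and $c_n^3/b_n \to 0$. □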
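
The plug-in step behind the estimator of $f$ in the scalar case can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the inputs `h_hat` and `dh_hat` stand for whatever estimates of $h(t)$ and $h'(t)$ are available, and the consistency check uses made-up values together with the assumption that $h = Cm$ with $m' = (A - DC)m + f$.

```python
def F_scalar(A, C, dC, D):
    """F(t) = D(t) - C'(t)/C(t)^2 - A(t)/C(t), evaluated at a single time point."""
    return D - dC / C**2 - A / C

def plug_in_trend(h_hat, dh_hat, A, C, dC, D):
    """Plug-in estimate h_hat'(t)/C(t) + F(t) * h_hat(t) of the state trend f(t)."""
    if C == 0:
        raise ValueError("C(t) must not vanish")
    return dh_hat / C + F_scalar(A, C, dC, D) * h_hat

# Consistency check with exact inputs (illustrative values only):
# m(t) = t^2, C(t) = 1 + t, A = -1, D = 0.5, so h = C m and f = m' - (A - D C) m.
t = 1.0
C, dC, A, D = 1.0 + t, 1.0, -1.0, 0.5
h = C * t**2                      # h(t) = C(t) m(t)
dh = dC * t**2 + C * 2.0 * t      # h'(t) by the product rule
f_true = 2.0 * t - (A - D * C) * t**2
f_hat = plug_in_trend(h, dh, A, C, dC, D)
```

With exact $h$ and $h'$ the plug-in formula returns $f$ exactly, so any error in the estimate of $f$ comes from the errors in the estimates of $h'$ and $h$, weighted by $1/C(t)$ and $F(t)$ respectively.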


6.  The  multivariate  case 


In the general case in which the state and measurement processes are $p$- and
$q$-dimensional, our results are modified in obvious ways to take into account the fact
that $A$, $B$, $C$ etc. are matrices. The innovations process $\nu$ is now a $q$-dimensional
Wiener process and (4) is replaced by the matrix Riccati equation

$$P'(t) = A(t)P(t) + P(t)A(t)^T - P(t)C(t)^T C(t)P(t) + I,$$

with initial condition $P(0)$ = covariance matrix of $X(0)$. Here $I$ is the $p \times p$
identity matrix, and $^T$ denotes "transpose." The Kalman gain is now given by
$D(t) = P(t)C(t)^T$.
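
For readers who want to compute $P(t)$ and the gain $D(t)$ numerically, the matrix Riccati equation above can be integrated with a standard ODE scheme. The sketch below uses a classical fourth-order Runge-Kutta step; the function names, the step count and the choice of integrator are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def riccati_rhs(t, P, A, C):
    """Right hand side A P + P A^T - P C^T C P + I of the matrix Riccati equation."""
    At, Ct = A(t), C(t)
    return At @ P + P @ At.T - P @ Ct.T @ Ct @ P + np.eye(P.shape[0])

def solve_riccati(A, C, P0, T, steps=1000):
    """Integrate the Riccati equation on [0, T] with classical RK4.

    A(t) returns a p x p array, C(t) a q x p array, P0 is the p x p
    covariance matrix of X(0). Returns (times, P) with P[k] = P(times[k]).
    """
    dt = T / steps
    ts = np.linspace(0.0, T, steps + 1)
    P = np.empty((steps + 1,) + P0.shape)
    P[0] = P0
    for k in range(steps):
        t, Pk = ts[k], P[k]
        k1 = riccati_rhs(t, Pk, A, C)
        k2 = riccati_rhs(t + dt / 2, Pk + dt / 2 * k1, A, C)
        k3 = riccati_rhs(t + dt / 2, Pk + dt / 2 * k2, A, C)
        k4 = riccati_rhs(t + dt, Pk + dt * k3, A, C)
        P[k + 1] = Pk + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return ts, P

def kalman_gain(P_t, C_t):
    """Kalman gain D(t) = P(t) C(t)^T, a p x q matrix."""
    return P_t @ C_t.T
```

A simple sanity check is the scalar case $A = 0$, $C = 1$, $P(0) = 0$, where the equation reduces to $P' = 1 - P^2$ with solution $P(t) = \tanh t$.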

In Section 3 the $q$-dimensional version of the orthogonal series sieve estimator
of $h$ is defined (using the same sieve for each component of $g$) by

d„ 

^kii)  = 

r=l 

$k = 1, \ldots, q$, where

=  -T  r -  u,u{t)dt). 

The kernel estimator $\hat h$ is defined (using the same kernel function and bandwidth for
each component of $g$) by the $q$-dimensional version of (15). The corresponding
estimators of $g$ are defined as before. Theorems 1-5 extend with the modification that
the limiting distribution in Theorems 4 and 5 is $N(0, k^2 I)$. In the proofs of these
results, the Wiener process involved becomes $q$-dimensional.

For the results of Section 4 to hold, the additional assumptions that $(A(\cdot), I)$ is
completely controllable and $(A(\cdot), C(\cdot))$ is completely observable are needed.
Condition (A) becomes

(A)  jLoc  ll^>4(^5)ll  <  oo  and  p/^(t,  s)||2  ds  <  oo  for  all  t  €  [O.  T]. 

Here $\|\cdot\|$ denotes operator norm. There is essentially no change in the proofs, with
the results of Bittanti et al. (1984) and Roitenberg (1974) being applied in the same
way as before.

The results of Section 5 extend under the condition that for each $t \in [0, T]$ the
matrix $C(t)$ has a left inverse $C^{-1}(t)$. This will be the case if $p \le q$ and $C(t)$ has
column rank $p$ for each $t \in [0, T]$. Then (31) becomes

$$f(t) = C^{-1}(t)h'(t) + F(t)h(t),$$

where

$$F(t) = D(t)C(t)C^{-1}(t) - C^{-1}(t)C'(t)C^{-1}(t) - A(t)C^{-1}(t),$$

showing that $f$ is identifiable. Note that $f$ is not identifiable if $p > q$. The limiting
distribution in Theorem 10 becomes $N(0, V(t))$, where

$$V(t) = C^{-1}(t)\,C^{-1}(t)^T \int K'(u)^2\,du.$$
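
The left-inverse condition and the resulting quantities are easy to check numerically. The sketch below picks one particular left inverse, $(C^T C)^{-1}C^T$, which exists exactly when $C(t)$ has full column rank $p$. The paper does not say which left inverse is intended, so this choice, like the function names, is an assumption made for illustration.

```python
import numpy as np

def left_inverse(C):
    """Left inverse (C^T C)^{-1} C^T of a q x p matrix C with full column rank p."""
    q, p = C.shape
    if np.linalg.matrix_rank(C) < p:
        raise ValueError("C must have full column rank p (requires p <= q)")
    return np.linalg.solve(C.T @ C, C.T)

def F_matrix(A, C, dC, D):
    """F = D C C^- - C^- C' C^- - A C^-, a p x q matrix, at one time point.

    A is p x p, C and its derivative dC are q x p, D is p x q.
    """
    Cinv = left_inverse(C)
    return D @ C @ Cinv - Cinv @ dC @ Cinv - A @ Cinv

def limiting_cov(C, int_Kprime_sq):
    """Limiting covariance C^- (C^-)^T times the integral of K'(u)^2."""
    Cinv = left_inverse(C)
    return (Cinv @ Cinv.T) * int_Kprime_sq
```

When $p > q$ the rank check fails and no left inverse exists, which mirrors the remark that $f$ is then not identifiable.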




7.  Directions  for  further  work 


The  techniques  and  results  developed  in  this  article  are  by  no  means  exhaustive. 
We  are  aware  of  many  important  questions  concerning  the  problem  of  nonparametric 
inference  for  linear  systems  in  continuous  time  for  which  we  have  no  answer  at  this 
stage.  We  conclude  by  listing  some  of  these  questions,  the  first  two  of  which  were 
mentioned  by  a  referee. 

(1) Is it possible to weaken the assumption that $A$, $B$ and $C$ be known? How
far would the analysis go if, say, $B$ were unknown? (This would require an
assumption of sufficient variability in the deterministic inputs $u_i$, $i \ge 1$, to avoid
an identifiability problem.) In the same vein, how robust are the estimators of
$f$ and $g$ to the specification of $A$, $B$ and $C$?

(2) Can anything be said about the optimal choice of the bandwidth in the kernel
estimators? In the cases of density estimation and nonparametric curve
estimation there are various techniques for automatically selecting the bandwidth.
It ought to be possible to develop such methods of 'cross-validation' here.

(3) Can a test for detecting the presence of a trend (e.g. a test of $g \equiv 0$) be
developed? More generally, it is of interest to test whether the trend is of
some specified form. As in the case of goodness-of-fit testing for distribution
functions, this might be done by deriving a functional central limit theorem for
an estimator of the cumulative trend function $G(\cdot) = \int_0^{\cdot} g(s)\,ds$.